-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DOCS] add docs for async search #53675
Conversation
Relates to elastic#49091
Pinging @elastic/es-docs (>docs) |
Pinging @elastic/es-search (:Search/Search) |
path = path.replace('<', '%3C').replace('>', '%3E') | ||
path = path.replace('{', '%7B').replace('}', '%7D') | ||
path = path.replace('|', '%7C') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping this would work but it does cause problems. I need to see if this escaping is needed, to my mind it isn't but I may be wrong. In that case I need to only escape unless curly brackets are part of an expression like ${expression}, which I don't look forward to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out this is easy to address, there were only a couple of tests where we were using unescaped |
which we can manually escape instead.
@@ -41,6 +45,15 @@ | |||
"description":"The number of shard results that should be reduced at once on the coordinating node. This value should be used as the granularity at which progress results will be made available.", | |||
"default":5 | |||
}, | |||
"pre_filter_shard_size":{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimczi I was wondering if you left this out on purpose. It is settable hence I added it, let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably prevent users to set this and assume that the can_match phase always runs on async_search. That means that we should throw an error if the value is set in the rest API but that's out of scope for this pr. +1 to remove it from the spec here though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have a look at disallowing to set this parameter in a followup
I left some comments to help fix wording, capitalization, and a few other things. Aside from those, this looks pretty good. I do wonder if the API docs might be easier to parse if they were formatted similar to the API reference here: Some of the sections regarding parameters could probably benefit from that structure. However, that would likely involve duplicating a lot of content from the current search API docs, which need work themselves. I'm okay with this for now. |
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @javanna , I left some comments regarding the documentation of the different options.
<2> Partial aggregations results, coming from the shards that have already | ||
completed the execution of the query. | ||
|
||
NOTE: When results are sorted by a numeric field, shards get sorted based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this note near to the submit query in order to explain why the example uses a sort ?
available before the timeout expires, otherwise the currently available results | ||
will be returned once the timeout expires. | ||
|
||
The `keep_alive` parameter, which defaults to `5d` (five days), specifies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The keep_alive
for the get api is a bit different since it allows to extend/change the expiration time for an existing id. Maybe good to make the distinction here since the default for get is to keep the current expiration time.
`1` second. | ||
|
||
The submit async search API supports the same <<search-search-api-query-params,parameters>> | ||
as the search API, though some have different default values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can list the ones that can be changed and omit the other. batched_reduce_size
, request_cache
and max_concurrent_shard_requests
+ the specific options (wait_for_completion, keep_alive) should be enough for this API. I think we said that we want to remove pre_filter_shard_size
so I'd remove it from here, same for ccs_minimize_roundtrips
which sounds like it could be changed but cannot.
So to be clear, a concrete list that presents all these options and then we can add a link to the search source documentation that should be unchanged from normal search request ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batched_reduce_size
is also important to document since that's the granularity at which partial results will be made available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I was planning on looking back to this once we disallow setting pre_filter_shard_size and maybe others, but I can address the docs in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add a link to the search source documentation
there is already a link. Why document max_concurrent_shard_request here? The default is the same as for search?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because I thought we would have the exhaustive list of options here rather than linking to the entire search request option. We only accept a subset of it so better to be explicit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the thing is that there's a ton of options that can be set to the search request. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-search-api-query-params . I thought it's better to just link to search rather than duplicate its docs, and mention the special cases or important aspects only in the async search docs. Would you rather list all of the search options also here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for a simple link but we should at least document batched_reduce_size
explicitly here for the reason exposed above. We can also add the list of options that are not accepted here (scroll
, ccs_minimize_roundtrips
and pre_filter_shard_size
) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep that's what I had in mind. batched_reduce_size makes a lot of sense to explain, I already did, I will push shortly
@jimczi I pushed some updates, would you like to have another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left two minor comments but the change looks good to me.
==== Get async search | ||
|
||
The get async search API retrieves the results of a previously submitted | ||
async search request given its id. The access to the results of a specific |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's only when security is enabled that we restrict the access. We should make the distinction here.
and their partial results are also included in the response. | ||
<2> Partial aggregations results, coming from the shards that have already | ||
completed the execution of the query. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an explanation for expiration_time_in_millis
, is_partial
and is_running
? The last two are important since they determine how to interpret the search response (partial or not) and if the search is still running.
Relates to #49091 Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
Relates to #49091